Performance Advantage of the Register Stack in Intel® Itanium™ Processors
Abstract
The Intel® Itanium™ architecture provides software with a virtual register stack of unlimited size. New virtual registers are allocated on a procedure call and deallocated on return. Itanium processors implement the register stack with three components: a large physical register file, a mapping from virtual to physical registers, and a Register Stack Engine (RSE) that saves the contents of the physical registers to memory and restores them without explicit program intervention. Together, these features significantly reduce the number of loads and stores needed to preserve registers across procedure calls, compared with a conventional architecture. In this paper, we show that the Itanium register stack reduces load and store traffic to the stack by at least a factor of three across selected SPECint2000 and Oracle database benchmarks. We also examine the effects of the register stack on data cache miss rates and program execution time. Compared with a conventional architecture, the register stack gives the Itanium architecture an average performance advantage of 7% to 8.3% on in-order processor models and 10.2% to 12% on out-of-order processor models. Finally, we analyze the vitality of stack loads and show that, in general, few stack loads are vital in an in-order model. In the out-of-order model, however, a larger percentage of stack loads become vital, which leads to a greater performance benefit from the register stack.
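The abstract describes the mechanism only at a high level. The following is a minimal Python sketch, built on deliberately simplified assumptions, of why a register stack saves memory traffic. It models each procedure frame as a plain count of registers and treats the physical file as a fixed pool. A stand-in for the RSE spills the oldest registers to memory only when a new frame does not fit, and it refills a caller's frame when control returns to it. The "conventional" baseline is a pessimistic assumption: it stores the whole frame on every call and loads it back on every return. The names `RegisterStackModel`, `register_stack_traffic`, and `conventional_traffic` are hypothetical. This is neither the paper's simulator nor the real RSE policy, which is more involved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegisterStackModel:
    """Toy register stack: a fixed pool of physical registers plus a lazy
    spill/fill engine (a loose stand-in for the RSE, not its real algorithm)."""
    physical_regs: int
    sizes: List[int] = field(default_factory=list)     # frame sizes, oldest first
    resident: List[int] = field(default_factory=list)  # registers of each frame still in the file
    stores: int = 0  # registers spilled to memory
    loads: int = 0   # registers filled back from memory

    def call(self, frame_size: int) -> None:
        if frame_size > self.physical_regs:
            raise ValueError("frame does not fit in the physical register file")
        # Spill only as many registers as needed, starting with the oldest frames.
        overflow = sum(self.resident) + frame_size - self.physical_regs
        i = 0
        while overflow > 0:
            spilled = min(self.resident[i], overflow)
            self.resident[i] -= spilled
            self.stores += spilled
            overflow -= spilled
            i += 1
        self.sizes.append(frame_size)
        self.resident.append(frame_size)

    def ret(self) -> None:
        self.sizes.pop()
        self.resident.pop()
        if self.sizes:
            # Hypothetical policy: the caller resumes only once its whole frame is resident.
            missing = self.sizes[-1] - self.resident[-1]
            self.resident[-1] += missing
            self.loads += missing


def register_stack_traffic(trace, physical_regs):
    """Return (stores, loads) for a trace of ("call", n) and ("ret",) events."""
    rs = RegisterStackModel(physical_regs)
    for op in trace:
        if op[0] == "call":
            rs.call(op[1])
        else:
            rs.ret()
    return rs.stores, rs.loads


def conventional_traffic(trace):
    """Pessimistic baseline (an assumption): every call saves its whole frame
    to the memory stack and every return restores it."""
    stores = loads = 0
    active = []
    for op in trace:
        if op[0] == "call":
            active.append(op[1])
            stores += op[1]
        else:
            loads += active.pop()
    return stores, loads


if __name__ == "__main__":
    # One caller invoking a small leaf procedure 100 times (illustrative sizes).
    trace = [("call", 10)] + [("call", 8), ("ret",)] * 100 + [("ret",)]
    print(register_stack_traffic(trace, physical_regs=96))  # (0, 0)
    print(conventional_traffic(trace))                       # (810, 810)
```

The leaf-call trace shows the favorable case for this toy model. When all active frames fit in the physical file, the register stack issues no stack loads or stores at all. A call chain deeper than the file can hold still produces spill and fill traffic in the model.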
Similar Papers
Optimizing Intel EPIC/Itanium2 Architecture for Forth
Forth is a stack machine that represents a good match for the register stack of the Explicitly Parallel Instruction Computing (EPIC) architecture. In this paper we will introduce a new calling mechanism using the register stack to implement a Forth system more efficiently. Based upon our performance measurements, we will show that the new calling mechanism is a promising technique to improve the p...
Compiler Controlled Register Stack Management for the Intel
Intel Itanium processors were designed with an on-chip register stack engine (RSE) in order to reduce the overhead related to procedure call boundaries. The RSE automatically preserves values stored in stacked registers across procedure invocations. This architecture model significantly reduces the amount of spill code necessary to maintain an application’s state, which in turn reduces memory t...
Optimization for the Intel
The Intel® Itanium® architecture contains a number of innovative compiler-controllable features designed to exploit instruction level parallelism. New code generation and optimization techniques are critical to the application of these features to improve processor performance. For instance, the Itanium® architecture provides a compiler-controllable virtual register stack to reduce the ...
Exploiting Intra-function Correlation with the Global History Stack
The demand for more computation power in high-end embedded systems has put embedded processors on an evolution track parallel to that of RISC processors. Caches and deeper pipelines are standard features on recent embedded microprocessors. As a result, some of the performance penalties associated with branch instructions in RISC processors are becoming more prevalent in these processors. As is...
Bit Swapping Linear Feedback Shift Register For Low Power Application Using 130nm Complementary Metal Oxide Semiconductor Technology (TECHNICAL NOTE)
Bit swapping linear feedback shift register (BS-LFSR) is employed in a conventional linear feedback shift register (LFSR) to reduce its power dissipation and enhance its performance. In this paper, an enhanced BS-LFSR for low power application is proposed. To achieve low power dissipation, the proposed BS-LFSR introduces the stacking technique to reduce leakage current. In addition, three diffe...